Colombo Amenity Distribution: Point Pattern Analysis & Spatial Regression

Author

Nimesh Akalanka

Published

February 26, 2026

Introduction

This notebook explores the spatial distribution of educational and healthcare facilities across the Colombo district in Sri Lanka. The analysis is structured into two main sections: Point Pattern Analysis and Spatial Regression Analysis.

User Stories

User Story 1: Urban Planner

Priya Jayawardena | Age 34 | Urban Planner
Urban Development Authority (UDA), Colombo, Sri Lanka

Responsibilities

  • Develop and review spatial development plans for Colombo district at the GND level.
  • Identify gaps in public facility coverage and recommend locations for new infrastructure investments.
  • Coordinate with the Ministry of Education to align school and kindergarten placement with population growth patterns.
  • Prepare evidence-based reports to support budget allocation decisions for educational infrastructure.

Experiences

  • Has extensive experience in facility gap analysis and spatial planning for public services.
  • Familiar with GIS-based tools for mapping and analyzing urban service coverage.
  • Previously worked on school rationalization projects identifying overcrowded vs. underutilized facilities.

Challenges

  • Lacks fine-grained spatial data on how educational facilities are distributed relative to population density at the GND level.
  • Difficult to justify new facility placements to stakeholders without quantitative evidence of clustering or underservice.
  • Existing planning reports rely on administrative counts rather than per-capita or density-adjusted measures, which misrepresent actual access gaps.

Goals

  • Identify which GNDs in Colombo are statistically underserved by educational facilities on a per-capita basis.
  • Use spatial regression findings to understand what infrastructure and demographic factors drive the current uneven distribution of schools and kindergartens.
  • Produce a prioritized list of GNDs for new educational facility investment backed by spatial evidence.

1 Data Loading & Preparation

These steps include the preparation of spatial data for the analysis of Colombo District. The task involves loading relevant datasets, converting the coordinate system to EPSG:32644 for proper dimension and area calculations, and merging demographic data with building information. The goal is to calculate attributes related to population demographics, housing, road infrastructure, and building area.

  • Population Groups:
    • Population_0_14: Number of people aged 0-14.
    • Population_15_59: Number of people aged 15-59.
    • Population_60_64: Number of people aged 60-64.
    • Population_Above_65: Number of people aged above 65.
    • Male: Total number of males.
    • Female: Total number of females.
    • Total_Population: Total number of people (Male + Female).
  • Housing and Employment:
    • Occupied_Housing_Units: Number of occupied housing units.
    • Poverty_Percentage: Percentage of the population living in poverty.
    • Unemployment_Rate: Unemployment rate in the area.
  • Infrastructure:
    • Road_Density: Density of roads in the area.
    • Road_Length: Total length of roads in the area.
  • Geospatial Calculations:
    • Population_Density: Population density per square kilometer.
    • Building_Area: Area of buildings (units: square meters)

Furthermore all the datasets are clipped in to the Colombo district boundary to ensure that the analysis is focused on the relevant geographic area.

# Load necessary libraries
library(sf)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tmap)
library(terra)
library(spatstat)
library(spdep)
library(spatialreg)
library(knitr)
library(kableExtra)
systemfonts and textshaping have been compiled with different versions of Freetype. Because of this, textshaping will not use the font cache provided by systemfonts
library(patchwork)

# Improve figure clarity in rendered output
knitr::opts_chunk$set(
  fig.align = "center",
  fig.width = 9,
  fig.height = 6,
  dpi = 180,
  out.width = "100%"
)

# Select an projected coordinate system and set up the output directory
utm_crs <- 32644
output_dir <- "output"
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
gpkg_path <- file.path(output_dir, "output_layers.gpkg")

# Load Sri Lankan GND boundaries
sl_gnds <- st_read("data/sl_gnds.shp", quiet = TRUE) %>%
  st_make_valid() %>%
  st_transform(utm_crs)

# Consider only Colombo district for this analysis
names(sl_gnds)
[1] "PROVINCE_N" "DISTRICT_N" "DSD_N"      "GND_N"      "geometry"  
unique(sl_gnds$DISTRICT_N)
 [1] "Kandy"        "Matale"       "Nuwara Eliya" "Ampara"       "Batticaloa"  
 [6] "Trincomalee"  "Anuradhapura" "Polonnaruwa"  "Kurunegala"   "Puttalam"    
[11] "Jaffna"       "Kilinochchi"  "Mannar"       "Mullaitivu"   "Vavuniya"    
[16] "Kegalle"      "Ratnapura"    "Galle"        "Hambantota"   "Matara"      
[21] "Badulla"      "Moneragala"   "Colombo"      "Gampaha"      "Kalutara"    
cmb_gnds <- sl_gnds %>% filter(DISTRICT_N == "Colombo")
cmb_boundary <- st_union(cmb_gnds)

# Read the census population data from census csv file
cmb_census <- read.csv("data/cmb_census_2024.csv")

# Join the census population data to the Colombo GNDs
# Consider both the GND and DSD names
cmb_gnds <- cmb_gnds %>%
  mutate(
    GND_N = trimws(toupper(GND_N)),
    DSD_N = trimws(toupper(DSD_N))
  ) %>%
  left_join(
    cmb_census %>%
      mutate(
        GND_Name = trimws(toupper(GND_Name)),
        DSD_Name = trimws(toupper(DSD_Name)),
        AG_0_14 = as.numeric(gsub(",", "", trimws(AG_0_14))),       
        AG_15_59 = as.numeric(gsub(",", "", trimws(AG_15_59))),
        AG_60_64 = as.numeric(gsub(",", "", trimws(AG_60_64))),
        AG_65_and_above = as.numeric(gsub(",", "", trimws(AG_65_and_above))),
        Male = as.numeric(gsub(",", "", trimws(Male))),
        Female = as.numeric(gsub(",", "", trimws(Female))),
        Total = as.numeric(gsub(",", "", trimws(Total))),
        Occupied_Housing_Units = as.numeric(gsub(",", "", trimws(Occupied_Housing_Units))),
        Poverty_Percentage = as.numeric(gsub(",", "", trimws(Poverty_Percentage))),
        Unemployment_Rate = as.numeric(gsub(",", "", trimws(Unemployment_Rate))),
        Road_Density = as.numeric(gsub(",", "", trimws(Road_Density))),
        Road_Length = as.numeric(gsub(",", "", trimws(Road_Length)))
      ) %>%
      select(
        DSD_Name, GND_Name, AG_0_14, AG_15_59, AG_60_64, AG_65_and_above, Male, Female, Total,
        Occupied_Housing_Units, Poverty_Percentage, Unemployment_Rate, Road_Density, Road_Length
      ),
    by = c("DSD_N" = "DSD_Name", "GND_N" = "GND_Name")
  ) %>%
  mutate(
    GND_N = tools::toTitleCase(tolower(GND_N)),
    DSD_N = tools::toTitleCase(tolower(DSD_N)),
    Area_sqkm = as.numeric(st_area(geometry)) / 1e6,           
    Pop_density_sqkm = Total / Area_sqkm
  )

# Check how many GNDs were successfully matched
cat("GNDs with population data:", sum(!is.na(cmb_gnds$Total)), "of", nrow(cmb_gnds), "\n")
GNDs with population data: 557 of 557 
# Load Sri Lankan Building footprints
cmb_build <- st_read("data/sl_buildings.shp", quiet = TRUE) %>%
  st_transform(utm_crs) %>% 
  st_intersection(cmb_boundary)

# Calculate total building footprint area per GND (Unit: square meters)
building_area_by_gnd <- cmb_build %>%
  mutate(building_area = as.numeric(st_area(.))) %>% 
  st_join(cmb_gnds %>% select(GND_N)) %>%  
  st_drop_geometry() %>%  
  group_by(GND_N) %>% 
  summarise(Building_area = sum(building_area, na.rm = TRUE), .groups = "drop")

# Join building area back to cmb_gnds 
cmb_gnds <- cmb_gnds %>%
  left_join(building_area_by_gnd, by = "GND_N") %>%
  mutate(Building_area = coalesce(Building_area, 0))

# Save the file to GeoPackage
st_write(cmb_build, gpkg_path, layer = "cmb_build", delete_layer = TRUE, quiet = TRUE)

# Show the information about the loaded buildings and POI data
cat("Number of GNDs loaded:", nrow(cmb_gnds), "\n")
Number of GNDs loaded: 557 

In this study mainly 02 main amenity categories are considered in the OSM dataset under the fclass attribute:

  • Education Facilities:
    • school
    • university
    • college
    • kindergarten
  • Healthcare Facilites:
    • hospital
    • clinic
    • pharmacy
    • doctors
    • dentist
    • veterinary
    • nursing_home

Then other POI categories are filtered out and only the above two categories are retained for the analysis.

# Import Colombo POI data as points
cmb_poi <- st_read("data/sl_pois.shp", quiet = TRUE) %>%
  st_transform(utm_crs) %>%
  st_intersection(cmb_boundary)

# Inspect the POI attribute names
print(names(cmb_poi))
[1] "osm_id"   "code"     "fclass"   "name"     "geometry"
# Classify POIs based on fclass
cmb_poi <- cmb_poi %>%
  mutate(
    amenity_type = case_when(
      fclass %in% c("school", "university", "college", "kindergarten") ~ "education", 
      fclass %in% c("hospital", "clinic", "pharmacy", "doctors", "dentist", "veterinary", "nursing_home") ~ "healthcare",
      TRUE ~ NA_character_
    )) %>% filter(!is.na(amenity_type))

# Perform a spatial join to associate GND_N with POIs
cmb_poi_with_gnd <- st_join(cmb_poi, cmb_gnds %>% select(GND_N))

# Check if the GND_N is correctly added to the POI data
print(names(cmb_poi_with_gnd)) 
[1] "osm_id"       "code"         "fclass"       "name"         "amenity_type"
[6] "GND_N"        "geometry"    
head(cmb_poi_with_gnd)
Simple feature collection with 6 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 374626.5 ymin: 750124.3 xmax: 386507 ymax: 762313.2
Projected CRS: WGS 84 / UTM zone 44N
     osm_id code     fclass                                  name amenity_type
1 191815840 2110   hospital               Asiri Surgical Hospital   healthcare
2 253022237 2110   hospital       Joseph Fraser Memorial Hospital   healthcare
3 254367261 2110   hospital             Wethara District Hospital   healthcare
4 255179653 2110   hospital Sri Jayawardenapura teaching hospital   healthcare
5 255670316 2081 university     University of Sri Jayewardenepura    education
6 256542591 2082     school   Somaweera Chandrasiri Junior School    education
                GND_N                  geometry
1         Narahenpita POINT (376228.3 762236.9)
2     Thimbirigasyaya POINT (374626.5 762313.2)
3             Wethara   POINT (386507 750124.3)
4     Thalapathpitiya POINT (381352.6 759326.6)
5 Gangodavila South b   POINT (379121 757704.9)
6         Mampe North POINT (380988.8 752143.1)
# Count education and healthcare POIs per GND
poi_counts_by_gnd <- cmb_poi_with_gnd %>%
  st_drop_geometry() %>%
  group_by(GND_N, amenity_type) %>%
  summarise(count = n(), .groups = "drop") %>%
  pivot_wider(names_from = amenity_type, values_from = count, values_fill = list(count = 0))


# Keep only final count columns for a stable join schema
poi_counts_by_gnd <- poi_counts_by_gnd %>%
  transmute(
    GND_N,
    education_count = as.integer(education),
    healthcare_count = as.integer(healthcare)
  )

# Check the resulting table
print(names(poi_counts_by_gnd))
[1] "GND_N"            "education_count"  "healthcare_count"
print(head(poi_counts_by_gnd))
# A tibble: 6 × 3
  GND_N          education_count healthcare_count
  <chr>                    <int>            <int>
1 Akaravita                    1                0
2 Aluth Ambalama               1                0
3 Aluthkade East               2                0
4 Aluthmawatha                 5                1
5 Asiri Uyana                  0                4
6 Athurugiriya                 1                1
# Join the POI category counts with density, count per 1000 people
cmb_gnds <- cmb_gnds %>%
  select(-any_of(c("education", "healthcare", "education_count", "healthcare_count",
                   "education_density", "healthcare_density",
                   "education_per1000", "healthcare_per1000"))) %>%
  left_join(poi_counts_by_gnd, by = "GND_N") %>%
  mutate(
    education_count    = coalesce(education_count, 0L),
    healthcare_count   = coalesce(healthcare_count, 0L),
    education_density  = education_count / Area_sqkm,
    healthcare_density = healthcare_count / Area_sqkm,
    education_per1000  = (education_count / Total) * 1000,
    healthcare_per1000 = (healthcare_count / Total) * 1000
  )

# Save the classified POIs to GeoPackage
st_write(cmb_poi, gpkg_path, layer = "cmb_poi", delete_layer = TRUE, quiet = TRUE)
st_write(cmb_gnds, gpkg_path, layer = "cmb_gnds", delete_layer = TRUE, quiet = TRUE)

2 Visualize the POI Distribution

Visualize the two amenity categories (educational and healthcare) across Colombo.

# Split the data based on amenity categories
education <- cmb_poi %>% filter(amenity_type == "education")
healthcare <- cmb_poi %>% filter(amenity_type == "healthcare")

# Define a function to create maps
make_map <- function(pts, title, dot_color) {
  ggplot() +
    geom_sf(data = cmb_gnds, fill = "#f7f7f7", color = "#d9d9d9", linewidth = 0.25) +
    geom_sf(data = st_union(cmb_gnds), fill = NA, color = "#333333", linewidth = 0.6) +
    geom_sf(
      data  = pts,
      color = dot_color,
      size  = 1.7,
      alpha = 0.75,
      shape = 16
    ) +
    labs(
      title = title,
      subtitle = paste0("n = ", nrow(pts), " points")
    ) +
    theme_void(base_size = 12) +
    theme(
      plot.title      = element_text(
        face   = "bold",
        hjust  = 0,
        size   = 14,
        margin = margin(b = 3)
      ),
      plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0, margin = margin(b = 8)),
      plot.background = element_rect(fill = "white", color = NA),
      plot.margin     = margin(10, 10, 10, 10)
    ) +
    coord_sf(expand = FALSE)
}

# Plot the map
m1 <- make_map(education, "Educational Facilities", "#2166ac")
m2 <- make_map(healthcare,  "Healthcare Facilities",  "#d6604d")

m1 / m2 +
  plot_annotation(
    title   = "Amenity Distribution across Colombo District",
    caption = "Source: OpenStreetMap | CRS: UTM Zone 44N",
    theme   = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )


3 Amenity Distribution Analysis

Are the educational, healthcare facilities in Colombo distributed randomly, clustered or regularly distributed? Since spatstat requries a observation window, Colombo boundary is used as the observation window.

3.1 Quadrat Count Analysis

library(viridis)
# Defining the observation window for the amenity categories
education_ppp <- as.ppp(education, W = as.owin(cmb_boundary))
healthcare_ppp <- as.ppp(healthcare, W = as.owin(cmb_boundary))

# Test the quadrant count for each amenity
education_quadrant <- quadratcount(education_ppp, nx = 10, ny = 10)
healthcare_quadrant <- quadratcount(healthcare_ppp, nx = 10, ny = 10)

# Plot the point distribution in the quardrants
op <- par(no.readonly = TRUE)
par(mfrow = c(2, 1), mar = c(3.8, 3.8, 3, 1.5), bg = "white")

plot(education_quadrant, main = "Educational Facilities | Quadrat Count", col = viridis(10), legend = TRUE, main.cex = 1.2)
plot(st_geometry(education), add = TRUE, col = adjustcolor("grey70", alpha.f = 0.4), pch = 16)
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

plot(healthcare_quadrant, main = "Healthcare Facilities | Quadrat Count", col = viridis(10),legend = TRUE, main.cex = 1.2)
plot(st_geometry(healthcare), add = TRUE, col = adjustcolor("grey70", alpha.f = 0.4), pch = 16)
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

par(op)

3.2 Chi-squared Test for Complete Spatial Randomness (CSR)

Let’s do a chi-squared test to see if the observed distribution of amenities in the quadrants follow a complete spatial randomness (CSR) pattern or do they show clustering or regularity.

# Test the quadrat counts for each amenity category
quadrat.test(education_quadrant)

    Chi-squared test of CSR using quadrat counts

data:  
X2 = 2078.4, df = 77, p-value < 2.2e-16
alternative hypothesis: two.sided

Quadrats: 78 tiles (irregular windows)

p-value is less than 0.05 (if we consider a significance of 5%), reject the null hypothesis of complete spatial randomness (CSR) for educational facilities, indicating that they are not randomly distributed across Colombo. The observed distribution is likely to be clustered or regular. The magnitude of the test statistic constitute strong evidence against randomness, indicates an extremely uneven distribution, where educational facilities are heavily concentrated in a small number of quadrats while large portions of Colombo have very few or none at all. This points to significant spatial clustering of educational facilities across the district.

# Test the quadrat counts for each amenity category
quadrat.test(healthcare_quadrant)

    Chi-squared test of CSR using quadrat counts

data:  
X2 = 808.99, df = 77, p-value < 2.2e-16
alternative hypothesis: two.sided

Quadrats: 78 tiles (irregular windows)

The p-value is less than 0.05 (at a 5% significance level), we reject the null hypothesis of Complete Spatial Randomness (CSR) for healthcare facilities, indicating that they are not randomly distributed across Colombo. The observed distribution is likely to be clustered or regular. Although the test statistic (X² = 808.99) is considerably smaller than that of educational facilities, it still constitutes strong evidence against randomness, indicating an uneven distribution where healthcare facilities are concentrated in certain quadrats while other parts of Colombo remain comparatively underserved. This points to significant spatial clustering of healthcare facilities across the district.

3.3 Density Analysis

Perform a kernel density estimation to visualize the intensity of the given amenity categories see their visual differences in terms of spatial distribution across Colombo.

# Calculate the kernel density
education_dens <- density(education_ppp, sigma = bw.diggle)
healthcare_dens <- density(healthcare_ppp, sigma = bw.diggle)

# Plot the kernel density
op <- par(no.readonly = TRUE)
par(mfrow = c(2, 1), mar = c(3.8, 3.8, 3, 1.5), bg = "white")

plot(education_dens, main = "Educational Facilities KDE", col = viridis(100))
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

plot(healthcare_dens, main = "Healthcare Facilities KDE", col = viridis(100))
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

par(op)

# Plot the GND wise Density Map
p1 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = education_density),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_viridis_c(
    option    = "viridis",
    name      = "Facilities\nper km²",
    na.value  = "grey90",
    trans     = "sqrt"
  ) +
  labs(
    title    = "Educational Facility Density",
    subtitle = "Facilities per km² by GND"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", size = 14, hjust = 0,
                                   margin = margin(b = 3)),
    plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0,
                                   margin = margin(b = 8)),
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin     = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p2 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = healthcare_density),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_viridis_c(
    option    = "viridis",
    name      = "Facilities\nper km²",
    na.value  = "grey90",
    trans     = "sqrt"
  ) +
  labs(
    title    = "Healthcare Facility Density",
    subtitle = "Facilities per km² by GND"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", size = 14, hjust = 0,
                                   margin = margin(b = 3)),
    plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0,
                                   margin = margin(b = 8)),
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin     = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p1 / p2 +
  plot_annotation(
    title   = "GND wise Facility Density across Colombo District",
    caption = "Source: OpenStreetMap | Area-adjusted counts | CRS: UTM Zone 44N",
    theme   = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )

3.4 Selecting an Appropriate Correlation Method

Let’s see is the distribution of educational and healthcare facilities are similar accross Colombo district. First check that what type of distribution the KDE values of both categories follw.

Code
set.seed(13244)

# Convert spatstat density objects to terra rasters
education_rast  <- rast(education_dens)
healthcare_rast <- rast(healthcare_dens)

# Resample healthcare raster to match education raster grid exactly
healthcare_rast_resampled <- resample(healthcare_rast, education_rast)

# Extract raster cell values
education_vals  <- values(education_rast)
healthcare_vals <- values(healthcare_rast_resampled)

# Remove NA cells (cells outside the observation window)
valid <- !is.na(education_vals) & !is.na(healthcare_vals)
cat("Valid raster cells used:", sum(valid), "\n")
Valid raster cells used: 8968 
Code
# Shapiro-Wilk requires n <= 5000
sample_idx <- sample(which(valid), min(5000, sum(valid)))

# Plot the histogram to visualize the distribution of intensity values for both categories
hist(education_vals[valid],
     breaks = 50,
     main   = "Education KDE",
     xlab   = "Intensity",
     col    = "#2166ac80",
     border = "white")

Code
# Test on raw values of education
shapiro_edu <- shapiro.test(education_vals[sample_idx])
cat("Education KDE:    W =", round(shapiro_edu$statistic, 4), "| p =", shapiro_edu$p.value, "\n")
Education KDE:    W = 0.3716 | p = 3.614476e-85 

Since the p-value of the Shapiro-Wilk test is 3.614476e-85, which is significantly less than 0.05, null hypothesis of normality distribution can be rejected. This indicates that the distribution of educational facility intensity across Colombo significantly deviates from normality, suggesting a non-normal distribution.

Code
# Plot the histogram
hist(healthcare_vals[valid],
     breaks = 50,
     main   = "Healthcare KDE",
     xlab   = "Intensity",
     col    = "#d6604d80",
     border = "white")

Code
# Shapiro Wilk test for the KDE values of healthcare facilities
shapiro_hlth <- shapiro.test(healthcare_vals[sample_idx])
cat("Healthcare KDE:   W =", round(shapiro_hlth$statistic, 4), "| p =", shapiro_hlth$p.value, "\n")
Healthcare KDE:   W = 0.2859 | p = 4.798521e-88 

Also the healthcare KDE values have p-value of p = 4.798521e-88, which is significantly less than 0.05, which says that the healthcare facility intensity distribution across Colombo also significantly deviates from normality, indicating a non-normal distribution.

Both educational and healthcare facilities show non-normal distributions in their spatial intensity across the district. Therefore using the Pearson R correlation coefficient to measure the relationship between the two categories may not be appropriate.

3.5 Correlation Analysis

Since both educational and healthcare facility intensity distributions are confirmed to be non-normal, Spearman’s rank correlation is used as the primary measure of spatial association. The analysis is conducted at two spatial scales to provide a comprehensive understanding of the relationship between the two amenity types across Colombo.

Code
# Pixel wise KDE Correlation
# Spearman's rank correlation test
spearman_kde <- cor.test(
  education_vals[valid],
  healthcare_vals[valid],
  method = "spearman"
)

# GND wise KDE Correlation
# Spearman's rank correlation test
spearman_gnd <- cor.test(
  cmb_gnds$education_density,
  cmb_gnds$healthcare_density,
  method = "spearman"
)

# Summary of the correlation results
cat("  - Pixel wise Spearman Correlation rho:", round(spearman_kde$estimate, 4), "\n")
  - Pixel wise Spearman Correlation rho: 0.5544 
Code
cat("  - GND wise Spearman Correlation rho:", round(spearman_gnd$estimate, 4), "\n")
  - GND wise Spearman Correlation rho: 0.3229 
Scale rho Strength
Pixel wise KDE 0.5544 Moderate positive
GND wise Density 0.3229 Moderate positive

At the pixel level, there is a moderate positive correlation between the intensity of education and healthcare facilities, this reflect that areas with higher concentration of educational facilities also tend to have higher concentration of healthcare facilities as well.

Additionally, when the data is aggregated at the GND level, the correlation is also moderate positive, which suggests that when looking at GND level densities, there is still tendency for GNDs with higher education facilities also tend to have higher healthcare facilities as well. However the correlation is weaker at GND level compared to pixel level.


4 Spatial Regression Analysis

I want to understand the factors that influence the distribution of education and healthcare facilities across Colombo district in terms of GND level attributes.

4.1 Data Exploration and Preparation

First see the distribution of education and healthcare facilities per 1000 people across GNDs

# Define break points and labels
edu_breaks <- c(0, 0.001, 0.5, 1.0, 2.0, max(cmb_gnds$education_per1000, na.rm = TRUE))
hlt_breaks <- c(0, 0.001, 0.5, 1.0, 2.0, max(cmb_gnds$healthcare_per1000, na.rm = TRUE))

# Plot the GND wise map for education and healthcare facilities per 1000 people
p1 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = education_per1000),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_stepsn(
    colours  = RColorBrewer::brewer.pal(5, "Blues"),
    breaks   = c(0.001, 0.5, 1.0, 2.0),
    limits   = c(0, max(cmb_gnds$education_per1000, na.rm = TRUE)),
    labels   = c("Very Low (<0.5)", "Low (0.5–1)", "Medium (1–2)", "High (>2)"),
    name     = "Education Facilities\nper 1000 people",
    na.value = "grey90",
    guide    = guide_coloursteps(show.limits = FALSE)
  ) +
  labs(
    title    = "Education Facilities per 1000 People",
    subtitle = "Education facilities per 1000 people by GND"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", size = 14, hjust = 0,
                                   margin = margin(b = 3)),
    plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0,
                                   margin = margin(b = 8)),
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin     = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p2 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = healthcare_per1000),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_stepsn(
    colours  = RColorBrewer::brewer.pal(5, "Reds"),
    breaks   = c(0.001, 0.5, 1.0, 2.0),
    limits   = c(0, max(cmb_gnds$healthcare_per1000, na.rm = TRUE)),
    labels   = c("Very Low (<0.5)", "Low (0.5–1)", "Medium (1–2)", "High (>2)"),
    name     = "Healthcare Facilities\nper 1000 people",
    na.value = "grey90",
    guide    = guide_coloursteps(show.limits = FALSE)
  ) +
  labs(
    title    = "Healthcare Facilities per 1000 People",
    subtitle = "Healthcare facilities per 1000 people by GND"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", size = 14, hjust = 0,
                                   margin = margin(b = 3)),
    plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0,
                                   margin = margin(b = 8)),
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin     = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p1 / p2 +
  plot_annotation(
    title   = "Amenity Accessibility across Colombo District",
    caption = "Source: OpenStreetMap & Census 2024 | CRS: UTM Zone 44N",
    theme   = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )

Let’s prepare regression datasets for both educational and healthcare facilities. Since spatial models assume linearity log transformations are applied to skewed data such as population density, road density and occupied housing units.

Additionally, demographic percentages are calculated to capture the age structure of the population and also for the built up area, which can influence the demand for the type of amenities this study considers.

# Prepare the regression dataset
cmb_reg <- cmb_gnds %>%
  filter(
    !is.na(Total), Total > 0,
    !is.na(Pop_density_sqkm),
    !is.na(Poverty_Percentage),
    !is.na(Unemployment_Rate),
    !is.na(Road_Density),
    !is.na(Building_area),
    !is.na(Occupied_Housing_Units)
  ) %>%
  mutate(
    log_pop_density = log1p(Pop_density_sqkm),
    log_road_density = log1p(Road_Density),
    log_occ_housing = log1p(Occupied_Housing_Units),
    pct_builtup_area = (Building_area / Area_sqkm * 100),
    pct_builtup_area = scale(pct_builtup_area)[,1],
    pct_young = (AG_0_14 / Total) * 100,
    pct_elderly = (AG_65_and_above / Total * 100),
    pct_working_age = ((AG_15_59 + AG_60_64) / Total * 100)
  )

# Check the number of GNDs in the Regression dataset
cat("GNDs for regression:", nrow(cmb_reg), "\n")
GNDs for regression: 556 

4.2 Build Spatial Weights Matrix

To analyze the spatial relationships, definition of spatial weights matrix is needed. In this study a queen contiguity-based spatial weights matrix is considered because this is a administrative boundary which any of the adjust administrative units can be considered as a neighbor and influence each other.

Code
# Create a spatial weights matrix using queen contiguity
nb_queen <- poly2nb(cmb_reg, queen = TRUE)
lw_queen <- nb2listw(nb_queen, style = "W", zero.policy = TRUE)

# Check the neighbour structure
cat(
  "Total GNDs:", length(nb_queen), "\n",
  "Average number of neighbours:", round(mean(card(nb_queen)), 2), "\n",
  "GNDs with no neighbours:", sum(card(nb_queen) == 0), "\n",
  "Min neighbours:", min(card(nb_queen)), "\n",
  "Max neighbours:", max(card(nb_queen)), "\n"
)
Total GNDs: 556 
 Average number of neighbours: 5.58 
 GNDs with no neighbours: 0 
 Min neighbours: 1 
 Max neighbours: 13 

This spatial weights matrix contains 556 GNDs with an average of 5.58 neighbours per GND. There is no GND with zero neighbours, which means all GNDs have at least one adjacent GND. The maximum number of neighbours is 13.

4.3 Define Spatial Regression Formulas

The 02 formulas for spatial regression models are defined with same set of independent variables to analyze the factors influencing the distribution of education and healthcare facilities across Colombo district.

# Formular for education facilities per 1000 people
formula_education <- formula(
  education_per1000 ~ log_pop_density + Poverty_Percentage +
    Unemployment_Rate + log_road_density + log_occ_housing + pct_builtup_area + 
    pct_young + pct_elderly
)

# Formular for healthcare facilities per 1000 people
formula_healthcare <- formula(
  healthcare_per1000 ~ log_pop_density + Poverty_Percentage +
    Unemployment_Rate + log_road_density + log_occ_housing + pct_builtup_area + 
    pct_young + pct_elderly
)

4.4 OLS Baseline Models

First, OLS regression models are fitted for both educational and healthcare facilities to establish a baseline understanding of the relationships between the independent variables and the dependent variables without considering the spatial autocorrelation.

Code
# OLS regression models for both education and healthcare facilities
ols_education <- lm(formula_education, data = cmb_reg)
ols_healthcare <- lm(formula_healthcare, data = cmb_reg)

# Display the summary of the OLS models
summary(ols_education)

Call:
lm(formula = formula_education, data = cmb_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7911 -0.1801 -0.0663  0.0615  3.6620 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -0.237418   0.332010  -0.715 0.474857    
log_pop_density    -0.100105   0.025994  -3.851 0.000132 ***
Poverty_Percentage -0.005566   0.006600  -0.843 0.399381    
Unemployment_Rate  -0.027688   0.020233  -1.368 0.171721    
log_road_density    0.121649   0.029300   4.152 3.83e-05 ***
log_occ_housing     0.064911   0.035163   1.846 0.065429 .  
pct_builtup_area    0.098181   0.020402   4.812 1.93e-06 ***
pct_young           0.020296   0.009490   2.139 0.032910 *  
pct_elderly         0.026996   0.008022   3.365 0.000818 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.375 on 547 degrees of freedom
Multiple R-squared:  0.1503,    Adjusted R-squared:  0.1378 
F-statistic: 12.09 on 8 and 547 DF,  p-value: 5.724e-16
Code
summary(ols_healthcare)

Call:
lm(formula = formula_healthcare, data = cmb_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66111 -0.15761 -0.04949  0.06864  2.67199 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         1.693721   0.283507   5.974 4.17e-09 ***
log_pop_density    -0.075233   0.022197  -3.389 0.000751 ***
Poverty_Percentage  0.003195   0.005635   0.567 0.571011    
Unemployment_Rate  -0.027336   0.017277  -1.582 0.114173    
log_road_density    0.121061   0.025020   4.839 1.70e-06 ***
log_occ_housing    -0.056503   0.030026  -1.882 0.060392 .  
pct_builtup_area    0.055878   0.017422   3.207 0.001418 ** 
pct_young          -0.035467   0.008104  -4.377 1.44e-05 ***
pct_elderly        -0.011858   0.006850  -1.731 0.083996 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3202 on 547 degrees of freedom
Multiple R-squared:  0.2056,    Adjusted R-squared:  0.194 
F-statistic:  17.7 on 8 and 547 DF,  p-value: < 2.2e-16

OLS baseline model shows that for education facilities per 1000 people these factors considered in the formular accounts for only 15%, for the healthcare facilities per 1000 people the R-squared value is 20.56, which means the model only explains about 20.56% of the changes in healthcare facilities distribution across GNDs.

4.5 Moran’s I Test on OLS Residuals

Run the Moran’s I test on the residuals of the OLS models to see if there is any spatial autocorrelation present. If there is significant spatial autocorrelation in the residuals, it shows that the OLS model doesnt fully capture the spatial processes underline with the distribution of the education and healthcare amenities.

# Moran's I test on OLS residuals to check for spatial autocorrelation
moran_education <- lm.morantest(ols_education, lw_queen, zero.policy = TRUE)
moran_healthcare <- lm.morantest(ols_healthcare, lw_queen, zero.policy = TRUE)

# Display the results of Moran's I test
moran_education

    Global Moran I for regression residuals

data:  
model: lm(formula = formula_education, data = cmb_reg)
weights: lw_queen

Moran I statistic standard deviate = 10.002, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    0.2503638865    -0.0079900879     0.0006671923 
moran_healthcare

    Global Moran I for regression residuals

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
weights: lw_queen

Moran I statistic standard deviate = 3.1514, p-value = 0.0008126
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    0.0734097026    -0.0079900879     0.0006671923 

The Moran’s I test results shows significant spatial autocorrelation in the education facilities per 1000 people value of 0.25, which tell that there is a unexplained variation in the distribution of education facilities across GNDs.

Healthcare facilities show week clustering Moran’s I = 0.073, p < 0.001 confirming that even though the spatial structure is weaker, it is not negligible and cannot be attributed to chance.

In both cases, the null hypothesis of spatial randomness is rejected, meaning OLS assumptions are violated and spatial regression models are necessary for reliable inference.

4.6 Lagrange Multiplier Diagnostics

Code
lm_diag_education <- lm.LMtests(
  ols_education, lw_queen,
  test = c("LMlag", "LMerr", "RLMlag", "RLMerr"),
  zero.policy = TRUE
)

lm_diag_healthcare <- lm.LMtests(
  ols_healthcare, lw_queen,
  test = c("LMlag", "LMerr", "RLMlag", "RLMerr"),
  zero.policy = TRUE
)

lm_diag_education

    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

RSlag = 82.552, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

RSerr = 90.572, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

adjRSlag = 0.017371, df = 1, p-value = 0.8951


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

adjRSerr = 8.0371, df = 1, p-value = 0.004583
Code
lm_diag_healthcare

    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

RSlag = 3.4042, df = 1, p-value = 0.06503


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

RSerr = 7.7867, df = 1, p-value = 0.005263


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

adjRSlag = 3.7554, df = 1, p-value = 0.05264


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

adjRSerr = 8.138, df = 1, p-value = 0.004335

The robust Lagrange Multiplier diagnostics indicate that for both facility types the spatial dependence operates through the error term. The robust spatial error test is significant for both education (adjRSerr = 8.037, p = 0.005) and healthcare (adjRSerr = 8.138, p = 0.004), while the robust spatial lag test is not significant in either case. A Spatial Error Model is therefore selected for both outcomes, suggesting the spatial clustering is driven by unmeasured background factors influencing the distribution of amenities rather than direct spillover effects between GNDs.

4.7 Spatial Error Models

Since the residual autocorrelation is positive for both models, a spatial error model is fitted to account for spatial autocorrelation in the error terms, which is likely due to unobserved spatially structured factors influencing the distribution of education and healthcare facilities across GNDs.

Code
sem_education <- errorsarlm(
  formula_education,
  data        = cmb_reg,
  listw       = lw_queen,
  zero.policy = TRUE
)

sem_healthcare <- errorsarlm(
  formula_healthcare,
  data        = cmb_reg,
  listw       = lw_queen,
  zero.policy = TRUE
)

summary(sem_education)

Call:errorsarlm(formula = formula_education, data = cmb_reg, listw = lw_queen, 
    zero.policy = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.820641 -0.159433 -0.051457  0.060284  3.576406 

Type: error 
Coefficients: (asymptotic standard errors) 
                     Estimate Std. Error z value  Pr(>|z|)
(Intercept)        -0.3311486  0.3342088 -0.9908 0.3217621
log_pop_density    -0.0786423  0.0275889 -2.8505 0.0043650
Poverty_Percentage -0.0064335  0.0078839 -0.8160 0.4144858
Unemployment_Rate  -0.0013160  0.0216732 -0.0607 0.9515832
log_road_density    0.0981049  0.0267494  3.6676 0.0002449
log_occ_housing     0.0553318  0.0339417  1.6302 0.1030589
pct_builtup_area    0.1315490  0.0235414  5.5880 2.297e-08
pct_young           0.0125422  0.0097254  1.2896 0.1971765
pct_elderly         0.0291283  0.0084875  3.4319 0.0005993

Lambda: 0.45249, LR test value: 69.254, p-value: < 2.22e-16
Asymptotic standard error: 0.05217
    z-value: 8.6733, p-value: < 2.22e-16
Wald statistic: 75.227, p-value: < 2.22e-16

Log likelihood: -204.4203 for error model
ML residual variance (sigma squared): 0.11694, (sigma: 0.34196)
Number of observations: 556 
Number of parameters estimated: 11 
AIC: 430.84, (AIC for lm: 498.09)
Code
summary(sem_healthcare)

Call:
errorsarlm(formula = formula_healthcare, data = cmb_reg, listw = lw_queen, 
    zero.policy = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.728531 -0.155783 -0.048381  0.057515  2.590605 

Type: error 
Coefficients: (asymptotic standard errors) 
                     Estimate Std. Error z value  Pr(>|z|)
(Intercept)         1.7755645  0.2902770  6.1168 9.548e-10
log_pop_density    -0.0729141  0.0231772 -3.1459 0.0016555
Poverty_Percentage  0.0045219  0.0061405  0.7364 0.4614889
Unemployment_Rate  -0.0197889  0.0181508 -1.0902 0.2756039
log_road_density    0.1044536  0.0246594  4.2358 2.277e-05
log_occ_housing    -0.0532194  0.0303939 -1.7510 0.0799478
pct_builtup_area    0.0695134  0.0188775  3.6824 0.0002311
pct_young          -0.0429953  0.0083979 -5.1197 3.060e-07
pct_elderly        -0.0133251  0.0071515 -1.8633 0.0624257

Lambda: 0.19021, LR test value: 7.9367, p-value: 0.0048441
Asymptotic standard error: 0.063169
    z-value: 3.0112, p-value: 0.0026023
Wald statistic: 9.0673, p-value: 0.0026023

Log likelihood: -147.2707 for error model
ML residual variance (sigma squared): 0.098756, (sigma: 0.31426)
Number of observations: 556 
Number of parameters estimated: 11 
AIC: 316.54, (AIC for lm: 322.48)

Education SEM: Road accessibility and built-up intensity are the strongest drivers of education facility distribution, with better connected and more urbanised GNDs having significantly more facilities. Denser GNDs are relatively underserved on a per-capita basis. The strong spatial error parameter (λ = 0.452) indicates that unmeasured spatially structured factors account for a significant share of the clustering pattern.

Healthcare SEM: Road accessibility and built-up intensity similarly drive healthcare facility distribution. However, the most striking finding is that GNDs with higher proportions of young residents are significantly underserved in healthcare, suggesting a demographic mismatch in provision. The weaker spatial error parameter (λ = 0.190) reflects that healthcare distribution is somewhat less spatially structured than education, consistent with the weaker Moran’s I found earlier.

Across both models, population density shows a consistent negative relationship, denser GNDs tend to have fewer facilities per capita, pointing to a potential equity concern in Colombo’s most densely populated areas.

4.8 Validate SEM Residuals

Code
# Morans'I test on SEM residuals to check if spatial autocorrelation is adequately accounted for
moran_sem_edu <- moran.test(residuals(sem_education), lw_queen, zero.policy = TRUE)
moran_sem_hlt <- moran.test(residuals(sem_healthcare), lw_queen, zero.policy = TRUE)

moran_sem_edu

    Moran I test under randomisation

data:  residuals(sem_education)  
weights: lw_queen    

Moran I statistic standard deviate = -0.98914, p-value = 0.8387
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     -0.026867835      -0.001801802       0.000642183 
Code
moran_sem_hlt

    Moran I test under randomisation

data:  residuals(sem_healthcare)  
weights: lw_queen    

Moran I statistic standard deviate = -0.17518, p-value = 0.5695
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
    -0.0063130091     -0.0018018018      0.0006631888 

The Moran’s I test on SEM residuals confirms that both models have successfully removed the spatial autocorrelation present in the OLS residuals. Education SEM residuals show a Moran’s I of -0.027 (p = 0.839) and healthcare SEM residuals show -0.006 (p = 0.570), both well above the 0.05 significance threshold. The slightly negative Moran’s I values are ideal, they indicate the residuals are now pure noise with no remaining spatial pattern, confirming that the Spatial Error Models have adequately captured the underlying spatial structure in both facility distributions.

4.9 Plot Residual Map

Code
# Add SEM residual to the regression dataset for mapping
cmb_reg <- cmb_reg %>%
  mutate(
    resid_edu = residuals(sem_education),
    resid_hlt = residuals(sem_healthcare)
  )

# Plot the Map
m1 <- ggplot() +
  geom_sf(data = cmb_reg, aes(fill = resid_edu),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_reg), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_distiller(
    palette = "RdBu",
    name = "Residual",
    direction = 1,
    limits = c(-max(abs(cmb_reg$resid_edu), na.rm = TRUE), max(abs(cmb_reg$resid_edu), na.rm = TRUE))
  ) +
  labs(
    title = "Education per 1000 - SEM Residuals",
    subtitle = "No spatial pattern = model is valid"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 13, hjust = 0),
    plot.subtitle = element_text(size = 10, color = "#5f5f5f", hjust = 0),
    plot.background = element_rect(fill = "white", color = NA)
  )

m2 <- ggplot() +
  geom_sf(data = cmb_reg, aes(fill = resid_hlt),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_reg), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_distiller(
    palette = "RdBu",
    name = "Residual",
    direction = 1,
    limits = c(-max(abs(cmb_reg$resid_hlt), na.rm = TRUE), max(abs(cmb_reg$resid_hlt), na.rm = TRUE))
  ) +
  labs(
    title = "Healthcare per 1000 - SEM Residuals",
    subtitle = "No spatial pattern = model is valid"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 13, hjust = 0),
    plot.subtitle = element_text(size = 10, color = "#5f5f5f", hjust = 0),
    plot.background = element_rect(fill = "white", color = NA)
  )

m1 / m2 +
  plot_annotation(
    title = "Spatial Model Residual Maps, Colombo District",
    caption = "Source: OpenStreetMap & Census 2024 | CRS: UTM Zone 44N",
    theme = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )

Education map: Almost entirely white/near-zero residuals across the district with only a couple of isolated blue GNDs. No geographic clustering pattern is visible, residuals are randomly scattered.

Healthcare map: Similarly random mix of very light red and blue with no systematic geographic pattern. The values are small and spread without any clear directional clustering.

Both maps visually confirm what the Moran’s I validation already told us statistically, the spatial structure has been fully captured by the models.